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ABSTRACT 

We  present  a  basic  technique  to  fill-in  missing  parts  of  a 
video  sequence  taken  from  a  static  camera.  Two  impor¬ 
tant  cases  are  considered.  The  first  case  is  concerned  with 
the  removal  of  non- stationary  objects  that  occlude  station¬ 
ary  background.  We  use  a  priority  based  spatio-temporal 
synthesis  scheme  for  inpainting  the  stationary  background. 
The  second  and  more  difficult  case  involves  filling-in  mov¬ 
ing  objects  when  they  are  partially  occluded.  For  this,  we 
propose  a  priority  scheme  to  first  inpaint  the  occluded  mov¬ 
ing  objects  and  then  fill-in  the  remaining  area  with  station¬ 
ary  background  using  the  method  proposed  for  the  first  case. 
We  use  as  input  an  optical-flow  based  mask,  which  tells  if 
an  undamaged  pixel  is  moving  or  is  stationary.  The  mov¬ 
ing  object  is  inpainted  by  copying  patches  from  undamaged 
frames,  and  this  copying  is  independent  of  the  background 
of  the  moving  object  in  either  frame.  This  work  has  applica¬ 
tions  in  a  variety  of  different  areas,  including  video  special 
effects  and  restoration  and  enhancement  of  damaged  videos. 
The  examples  shown  in  the  paper  illustrate  these  ideas. 


1.  INTRODUCTION  AND  OVERVIEW 

Videos  are  an  important  medium  of  communication  and  ex¬ 
pression  in  today’s  world.  Inspite  of  this,  most  of  the  video 
editing  is  done  manually  at  the  expense  of  a  huge  amount  of 
time  and  money.  Hence,  the  problem  of  automatic  restora¬ 
tion  of  old  movies  and  automatic  object  removal  and  edit¬ 
ing  for  video  has  begun  to  attract  the  attention  of  many  re¬ 
searchers.  In  this  paper  we  address  a  constrained  but  impor¬ 
tant  case  of  this  problem.  The  constraint  being  that  the  cam¬ 
era  is  fixed,  and  the  scene  essentially  consists  of  stationary 
background  with  some  moving  foreground,  either  of  which 
may  require  inpainting.  The  moving  object  inpainting  is 
background  independent. 


This  work  was  partially  supported  by  NSF,  ONR,  and  NGA.  We  would 
like  to  thank  Liron  Yatziv  and  Alberto  Bartesaghi  for  interesting  discus¬ 
sions.  KP  would  like  to  thank  Arvind  Menon  for  help  in  preparing  test 
video  sequences. 


1.1.  Basic  Related  Work 

There  has  been  some  preliminary  work  on  frame-by-frame 
PDEs  based  video  inpainting  [1],  following  [2].  PDE  based 
methods  are  mainly  edge-continuing  methods.  In  [1],  the 
PDE  is  applied  spatially,  and  completes  the  video  frame- 
by-frame.  This  does  not  take  into  account  the  temporal  in¬ 
formation  that  a  video  provides. 

The  authors  in  [3]  have  proposed  a  method  for  space- 
time  completion  of  large  damaged  areas  in  a  video  sequence. 
They  pose  the  problem  of  video  completion  as  a  global  op¬ 
timization  problem  with  a  well-defined  objective  function, 
extending  to  space+time  the  pioneering  static  work  in  [4]. 
The  results  shown  are  for  very  low  resolution  videos,  and 
the  inpainted  static  background  was  different  from  one  frame 
to  another  creating  a  ghost  effect.  Significant  over- smoothing 
is  observed  as  well. 

Very  interesting  work  for  repairing  damaged  video  has 
been  recently  reported  in  [5].  Though  the  results  are  very 
impressive  in  difficult  cases,  their  method  involves  a  gamut 
of  different  techniques  making  the  process  of  inpainting  very 
complicated.  A  related  approach  has  been  reported  in  [6] . 
These  works  combine  motion  layer  estimation  and  segmen¬ 
tation  with  warping  and  region  filling-in.  We  seek  a  simpler 
more  fundamental  approach  to  the  problem  of  video  inpaint¬ 
ing. 

1.2.  Overview  of  Our  Work 

Our  approach  is  fundamentally  related  to  the  non-parametric 
sampling  method  proposed  in  [4]  for  the  case  of  2-D  texture 
synthesis.  This  method  was  further  improved  upon  by  using 
a  priority  and  confidence  based  synthesis  in  [7].  We  have 
adapted  and  extended  this  technique  for  video  inpainting. 
The  general  objective  of  our  work  is  to  present  a  simple  and 
fast  method  for  automatically  filling-in  videos.  The  goal  of 
our  algorithm  is  twofold: 

(a)  To  fill-in  the  static  background  while  maintaining  its 
temporal  consistency. 

(b)  To  fill-in  the  moving  foreground  while  keeping  the  mo¬ 
tion  globally  consistent. 

To  achieve  this  we  assume  the  knowledge  (obtained  from 


pre-computed  optical  flow)  of  whether  a  pixel  is  moving  or 
not.  This  gives  us  a  segmentation  of  moving  foreground 
and  static  background.  We  fill-in  the  static  background  by 
first  looking  at  available  temporal  information  in  undam¬ 
aged  frames  and  “copying”  it  to  damaged  frames.  This  leads 
to  a  hole  that  is  common  to  all  the  frames  which  cannot 
be  filled-in  temporally.  A  priority  based  spatial  filling-in 
scheme  is  then  used  to  get  a  best  matching  patch  for  the 
highest  priority  location.  This  best  match  is  then  copied  to 
all  frames.  When  filling-in  moving  foreground,  we  have  de¬ 
vised  a  novel  priority  scheme  that  ensures  completion  of 
the  moving  object  in  each  frame.  Once  the  moving  ob¬ 
ject  is  filled-in,  we  are  left  with  the  simpler  case  of  station¬ 
ary  background  filling,  which  we  have  already  addressed. 
The  subsequent  sections  describe  in  more  detail  our  method 
for  video  completion  and  present  examples  illustrating  the 
ideas. 


2.  INPAINTING  STATIONARY  BACKGROUND 

In  this  section  we  describe  our  method  for  filling-in  missing 
stationary  background  that  is  occluded  by  a  stationary  or 
moving  object. 

We  first  assign  confidence  values  to  each  pixel  in  every 
frame.  The  confidence  of  pixels  which  are  deemed  to  be¬ 
long  to  the  moving  foreground  or  to  the  damaged  area  is 
set  to  zero.  The  rest  of  the  pixels  are  initialized  to  a  con¬ 
fidence  value  of  one.  The  process  of  background  filling  is 
completed  in  two  steps: 

Temporal  Filling-in:  We  search  for  the  highest  priority 
pixel  location  in  the  complete  video  sequence.  This  pri¬ 
ority  computation  is  similar  to  [7],  and  is  described  in  more 
details  later.  Temporal  information  (background  pixels)  is 
copied  from  the  temporally  nearest  undamaged  location  hav¬ 
ing  the  highest  confidence. 

Spatial  Filling-In:  Once  the  temporal  filling  is  over,  we  are 
left  with  a  video  sequence  where  all  frames  have  a  hole  at 
the  same  location.  We  again  find  the  highest  priority  lo¬ 
cation  to  be  filled-in,  and  find  a  best  matching  patch.  This 
patch  is  copied  to  all  the  frames,  so  as  to  maintain  consistent 
background  throughout  the  sequence. 


2.1.  Computing  Confidence  and  Filling-In  Priority 

During  the  temporal  filling-in  step,  priority  of  filling-in  the 
3-D  hole  is  computed  in  a  manner  similar  to  that  in  [7] .  The 
confidence  term  C  ( p ) ,  where  p  is  the  pixel  under  consider¬ 
ation,  is  initialized  to  zero  if  p  is  moving  or  is  damaged  and 
C(p)  is  initialized  to  one  otherwise.  The  second  relevant 
term  is  called  the  data- term  D(p),  and  its  value  is  based  on 
the  availability  of  temporal  information  at  location  p.  The 
data  term  is  computed  as  follows: 


D(p)  = 


S pedn,t=-8n...6n  ^ t(p ) 

P 


(1) 


where  D  indicates  the  hole  to  be  filled  in  ,  dfl  is  its  bound¬ 
ary,  and  Mt  =  0  if  p  is  damaged  or  if  p  is  moving,  else 
Mt  =  1.  The  time  index  t  indicates  the  relative  position 
of  any  frame  from  the  current  frame  (to  which  p  belongs, 
t  =  0).  The  denominator  (3  is  a  normalizing  constant  equal 
to  2n  +  1,  where  n  indicates  the  number  of  previous  and 
next  frames  considered.  Finally  the  priority  of  filling-in  at 
p  G  dfl  is  given  by  : 


P(p)  =  C(p)  *  Dip)  (2) 

This  priority  determines  the  damaged  frame  and  pixel 
location  which  we  need  to  first  fill-in  with  background  in¬ 
formation.  We  then  copy  temporal  information  patches  hav¬ 
ing  the  highest  confidence  value  from  the  temporally  nearest 
frame  to  the  location  p.  Since  we  are  filling-in  temporally, 
and  the  camera  remains  fixed,  we  do  not  need  to  perform 
an  explicit  search.  Such  confidence  based  nearest  neighbor 
copying  is  better  than  copying  directly  from  a  median  im¬ 
age,  the  median  may  not  contain  all  the  information  avail¬ 
able  in  the  temporal  neighborhood. 

Once  we  copy  a  patch  to  the  highest  priority  location 
p ,  the  confidence  at  all  previously  damaged  pixels  in  is 
updated  as  in  [7] 1  : 


C(p) 


E 


qe*Pn(i\n) 


C(q) 


(3) 


where  is  a  patch  centered  at  p,  |  |  is  its  area,  and  / 

denotes  the  video  frame. 

When  D(p)  =  0,  Vp  G  dCl  ,  this  indicates  that  there  is 
no  more  temporal  information  that  can  be  copied.  We  then 
perform  a  priority  based  spatial  filling-in  of  the  hole,  where 
the  data-term  in  the  priority  computation  follows  [7] : 


D(p)  = 


l(VIT)p-nP| 

a 


(4) 


where  I  is  the  grayscale  video  frame,  np  is  the  normal  to 
the  hole  boundary  dfl  at  p  and  a  is  a  normalizing  constant 
(usually  255). 

Important  to  remark  here  is  that  the  best  matching  patch 
found  is  copied  to  all  the  frames,  which  leads  to  a  consistent 
background  throughout  the  video  sequence.  Refer  to  Figure 
3  for  an  example. 


3.  INPAINTING  MOVING  FOREGROUND 

We  now  describe  our  method  for  filling-in  a  moving  object 
that  is  partially  occluded  by  a  stationary  or  moving  second 

^he  C(p)  term  in  Equation  (2)  is  also  computed  using  Equation  (3). 


object,  independent  of  the  changing  background  from  one 
frame  to  another.  The  moving  object  is  filled  in  frame-by- 
frame,  each  frame  being  completed  with  the  following  steps 
(refer  to  Figure  1): 

1.  Find  the  highest  priority  location  for  filling-in  the  moving 
object  in  the  current  frame  (right  of  Figure  la). 

2.  Search  for  the  best  matching  moving  patch  from  the  un¬ 
damaged  portion  of  the  video. 

3.  Copy  only  the  moving  part  of  the  best  matching  patch  to 
the  current  frame,  and  update  the  confidence  values  (Equa¬ 
tion  (3)). 

4.  Constrain  the  priority  so  that  any  pixel  in  the  copied  patch 
that  is  not  moving  gets  zero  priority  (Figure  lc  ). 

5.  Repeat  until  all  damaged  pixels  have  zero  priority. 

The  priority  computation  is  similar  to  that  described  in 
the  previous  section,  but  the  data  term  is  computed  differ¬ 
ently.  In  Equation  (4),  we  substitute  /  by  a  motion  con¬ 
fidence  image  Mc  (which  we  assume  is  given).  Mc(p)  is 
zero  if  p  belongs  to  the  static  background,  and  it  is  one  if  p 
belongs  to  moving  foreground.2 

The  metric  used  for  searching  the  matching  patch  is  a  5- 
D  SSD,  where  the  5  elements  of  the  distance  vector  are  the 
SSD  distances  between  the  R,  G,  B,  Vx  and  Vy  values.  Vx 
and  Vy  are  the  horizontal  and  vertical  velocity  components 
computed  crudely  using  Vx  =  y1  and  Vy  =  y1 ,  where  once 
again,  I  is  the  gray-scale  frame,  It  its  temporal  derivative, 
and  Ix  and  Iy  are  its  spatial  derivatives,  all  implemented 
with  standard  numerical  techniques. 

Once  the  moving  object  is  inpainted,  the  priority  for  all 
the  remaining  pixels  turns  out  to  be  zero  (because  of  the 
constraining  step  4).  The  remaining  part  of  the  hole  can  be 
easily  inpainted  using  the  technique  described  in  the  previ¬ 
ous  section. 

4.  IMPLEMENTATION  AND  RESULTS 

First,  Figure  2  completes  the  moving  red  rectangle  being 
occluded.  This  artificial  example  shows  that  if  we  are  given 
a  good  motion  confidence  image  Mc,  our  method  can  gen¬ 
erate  perfectly  completed  occluded  moving  objects.  In  Fig¬ 
ure  3  we  have  successfully  removed  the  person  wearing  an 
orange  jacket,  the  phone-box  in  the  center,  and  the  lamp- 
post  towards  the  left  of  center.  Observe  that  the  inpainted 
background  is  consistent  throughout  the  video.  Figure  4 
shows  first  the  completion  of  the  moving  person  and  then 
the  background.  Our  algorithm  performs  very  well  even 
when  the  region  to  be  inpainted  is  very  large.  The  algo¬ 
rithm  was  implemented  using  C++,  on  a  P-4  machine  with 
T68MB  of  RAM.  The  complete  sequences  shown  in  figures 
3  (15  frames)  and  4  (70  frames)  took  less  than  5  and  20  mins 

2 In  future  work  we  plan  to  use  optical  flow  with  confidence  values  to 
assign  non-binary  Mc’s. 


respectively.  All  these  and  more  video  results  can  be  viewed 
at  www.  tc.  umn.  edu/~patw0007/icip2005/index.  html 

5.  CONCLUDING  REMARKS 

We  presented  a  simple  method  for  filling-in  video  sequences. 
The  static  background  filled-in  is  consistent  throughout  the 
video.  Moving  object  filling-in  is  independent  of  changing 
background  from  one  frame  to  another  and  also  maintains 
motion  consistency.  We  assumed  that  the  motion  confidence 
image  Mc  is  given  and  showed  that  if  Mc  is  really  good  as  in 
the  synthetic  case,  then  the  moving  object  can  be  inpainted 
perfectly.  We  are  currently  addressing  non-binary  computa¬ 
tions  for  Mc. 

There  are  a  number  of  issues  that  we  have  not  addressed, 
e.g.,  moving  camera.  The  motion  consistency  that  we  achieve 
can  be  further  improved  by  using  a  3-D  search  of  moving 
patches  as  in  [7],  which  would  require  small  patch  sizes  for 
the  search  and  also  help  in  filling-in  completely  occluded 
moving  objects.  Results  in  this  direction  will  be  reported 
elsewhere. 
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(a)  Damaged  frame  on  the  left  and  initial  priorities  indicated 
on  the  right.  Blue  and  white  indicate  high  and  low  priorities 
respectively. 


(b)  Best  matching  frame  on  the  left  and  copied  patch  with  different 
background  shown  on  the  right. 
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(a)  Part  of  the  original  video  sequence. 


(b)  Corresponding  frames  with  the  static  background  filled  in. 


(c)  Enlarged  results  for  frame  1  above. 


Fig.  3.  Static  background  filling. 


(c)  Only  moving  part  of  the  patch  is  copied  (left)  and  the  priority 
is  constrained  (purple  indicates  zero  priority).  After  the  moving 
object  is  completed  there  is  no  damaged  pixel  with  high  priority. 

Fig.  1.  Overview  of  the  moving  object  filling -in  method. 


(a)  Synthetic  sequence  with  an  occluded  moving  red  rectangle. 


(b)  Moving  rectangle  is  filled  in. 

Fig.  2.  Synthetic  example  to  show  that  a  perfect  motion  confidence 
image  leads  to  a  perfect  moving  object  completion. 


(a)  Part  of  the  original  video  sequence. 


(b)  Moving  person  filled  in. 


(c)  Completely  filled  in  sequence. 


(d)  Enlarged  results  for  frame  1  above. 

Fig.  4.  Moving  foreground  +  static  background  filling. 


